The foundational theory of differentiation was developed as part of the original release of ACL2(r).
In work reported at the last ACL2 Workshop, we presented theorems justifying the usual differentiation
rules, including the chain rule and the derivative of inverse functions. However, the process
of applying these theorems to formalize the derivative of a particular function is completely manual.
More recently, we developed a macro and supporting functions that can automate this process. This
macro uses the ACL2 table facility to keep track of functions and their derivatives, and it also interacts
with the macro that introduces inverse functions in ACL2(r), so that their derivatives can also be
automated. In this paper, we present the implementation of this macro and related functions.

1 Introduction 

This paper describes the implementation of an automatic differentiator (AD) [RG11] that can find and
prove the derivative of algebraic expressions in ACL2(r). The tool is accessed through the macros
defderivative and derivative-hyps. We will describe these macros more fully later, but for now,
we introduce them with an example.

Suppose we have defined the function square that computes $x \cdot x$. The following event determines
the derivative of the function square and proves the associated theorems:

(defderivative square-deriv-local (square x))

The key theorem that the macro defderivative introduces is as follows:

(defthm square-deriv-local
  (implies (and (acl2-numberp x)
                (acl2-numberp y)
                (standardp x)
                (i-close x y)
                (not (equal x y)))
           (i-close (/ (- (square x)
                          (square y))
                       (- x y))
                    (+ (* x 1) (* x 1))))
  :hints ...)

The conclusion of this theorem states that the derivative of square is $x \cdot 1 + x \cdot 1$, which of course
simplifies to the expected value of $2x$. The macro does not automatically perform such simplifications, but
the user can easily provide the preferred form for the derivative function.¹ For example, the function
square-prime can be defined as (* 2 x), and the derivative theorems for it can be proved using the
macro derivative-hyps:

(derivative-hyps square
  :close-hints
  (("Goal"
    :use ((:instance square-deriv-local))
    :in-theory (disable square-deriv-local))))

The macro derivative-hyps introduces all the theorems that establish that the function square-prime 
is in fact the derivative of square. The keyword arguments allow the user to provide hints for some of the 
necessary theorems. In this case, it is necessary to explicitly invoke the theorem square-deriv-local, 
which was previously introduced via defderivative.
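
As a quick check on the conclusion of square-deriv-local, the difference quotient of square can be
worked out by hand. The derivation below is a sketch in ordinary mathematical notation rather than
ACL2(r) syntax; writing $\approx$ for i-close is our shorthand and is not notation used by the macros.

```latex
\begin{align*}
\frac{\mathrm{square}(x) - \mathrm{square}(y)}{x - y}
  &= \frac{x^2 - y^2}{x - y}
   = \frac{(x - y)(x + y)}{x - y}
   = x + y && (x \neq y) \\
  &\approx x + x && (y \approx x) \\
  &= x \cdot 1 + x \cdot 1 = 2x .
\end{align*}
```

The unsimplified form $x \cdot 1 + x \cdot 1$ and the preferred form (* 2 x) therefore denote the same value,
which is consistent with the hint above needing to invoke only square-deriv-local.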

The rest of this paper describes the implementation of these macros. Section 2 describes some differences
in the theory of differentiation that proved useful in developing the AD macros. In particular, the
proofs of the algebraic composition theorems differ from the ones described in [GC09] to take advantage
of the fact that the derivative is known in the current context. Section 3 describes the capabilities and
limitations of our AD system. This is followed in Section 4 with a full description of its implementation.
Finally, Section 5 describes future enhancements to these macros.

2 The Revised Story of Differentiation in ACL2(r) 

The theory of differentiation that was developed in [Gam00] and [GC09] is strictly foundational. For
one thing, the development is concerned more with differentiability than with derivatives. For example,
the theorem that describes the derivative of sums is stated informally as follows: If $f$ and $g$ are
differentiable functions, so is $f + g$. Notice that no mention is made of the derivatives of $f$, $g$, or $f + g$.
Instead, the theorems deal directly with expressions corresponding to the differential of the functions,
e.g., $\Delta f(x)/\Delta x = (f(x + \epsilon) - f(x))/\epsilon$.

Using principles from non-standard analysis [Rob96, Rob88], these derivatives can be introduced
implicitly by taking standard parts. That is, $f'(x) = \operatorname{st}(\Delta f(x)/\Delta x)$. However, this definition only works
when $x$ is standard. It can be generalized using defthm-std, but this process is unsatisfactory because
the relationship between $f'$, the expected derivative of the function $f$, and the standard part of $\Delta f/\Delta x$ is
obscured. This may explain why previous results included many abstract theorems about differentiable
functions, but few concrete derivatives. For instance, the derivatives of the trigonometric, exponential,
and logarithmic functions had not been proven in ACL2(r) before this project.

A more significant challenge is the use of intervals in the formalization of differentiation. Intervals
were used to define the domain (and sometimes range) of differentiable functions. This corresponds to
typical mathematical statements. For example, one of the hypotheses of the mean-value theorem (MVT)
is that $f$ is continuous over an interval $[a,b]$ and differentiable over $(a,b)$, and a user can select suitable
values of $a$ and $b$ when the theorem is applied manually. But this is harder to do when the theorems are
applied automatically, if for no other reason than the domain of some functions (such as $f(x) = 1/x$) is
not a simple interval.

¹ As we write this, we are working on a version of the macro that lets the user provide this function when the macro is
introduced. This will simplify the process described here.
Consequently, we developed new versions of the differentiability criterion that make explicit use
of the differentiable function $f$ and its derivative $f'$. We also redeveloped versions of the composition
theorems, namely

• $(f + g)'(x) = f'(x) + g'(x)$.

• $(f \cdot g)'(x) = f(x)g'(x) + f'(x)g(x)$.

• $(f^{-1})'(x) = 1/f'(f^{-1}(x))$, where $f^{-1}$ is the (compositional) inverse of $f$.

These new versions make the domain (and sometimes range) of $f$ explicit via functions instead of
intervals or some other data structure. The association between the function $f$, its derivative $f'$, and its
domain is kept using a naming convention; i.e., f, f-prime, and f-domain-p.

Using these new formalizations and the associated naming conventions was the key to automating
the application of the algebraic composition rules first formalized in [GC09]. However, the new
formalization does have some drawbacks. First, there is no guarantee that the domains used are, or even
contain, non-trivial intervals. This means, for example, that the derivative of $|x|$ could be vacuously formalized
on the domain $x \in \{0\}$. More seriously, however, it prevents the application of the more foundational
theorems established earlier, such as the MVT. We are investigating ways to bridge our current work with
prior results to remedy this issue.

The final, significant challenge is that some of the results obtained previously were proven in contexts
that turned out to be too restrictive. Specifically, important theorems, such as the chain rule, which is
used repeatedly during automatic differentiation, were developed only for real-valued functions. However,
the trigonometric functions in ACL2(r), such as sine and cosine, are defined in terms of the complex
exponential function. For example, $\sin(x) = (e^{ix} - e^{-ix})/2i$. So we developed a new formalization of the
chain rule, which works for complex numbers.
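
To see why a complex version of the chain rule is needed, consider how the derivative of sine would
follow from this definition. The following is a sketch of the textbook calculation, assuming the
derivative of the complex exponential is already known; it is not a transcript of the ACL2(r) proof.

```latex
\begin{align*}
\frac{d}{dx}\sin(x)
  &= \frac{d}{dx}\,\frac{e^{ix} - e^{-ix}}{2i} \\
  &= \frac{i\,e^{ix} - (-i)\,e^{-ix}}{2i}
     && \text{chain rule with inner functions } ix \text{ and } -ix \\
  &= \frac{e^{ix} + e^{-ix}}{2} = \cos(x).
\end{align*}
```

Even when $x$ is real, the inner functions $ix$ and $-ix$ take complex values, so a chain rule restricted
to real-valued functions does not apply to this step.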

3 The Automatic Differentiation Macros 

Previously, we discussed how the macros defderivative and derivative-hyps are used to introduce the
derivative of a function. In this section, we will explore these and other related macros more fully.

The macro derivative-hyps is used to generate automatically the theorems required to show that
the function f-prime is the derivative of f. The theorems and the associated naming conventions are as
follows:

• f-number

  (implies (f-domain-p x)
           (acl2-numberp (f x)))

• f-standard

  (implies (and (standardp x)
                (f-domain-p x))
           (standardp (f x)))






• f-continuous

  (implies (and (f-domain-p x)
                (standardp x)
                (f-domain-p y)
                (i-close x y))
           (i-close (f x) (f y)))

• f-prime-number

  (implies (f-domain-p x)
           (acl2-numberp (f-prime x)))

• f-prime-standard

  (implies (and (standardp x)
                (f-domain-p x))
           (standardp (f-prime x)))

• f-prime-continuous

  (implies (and (f-domain-p x)
                (standardp x)
                (f-domain-p y)
                (i-close x y))
           (i-close (f-prime x) (f-prime y)))

• f-close

  (implies (and (f-domain-p x)
                (standardp x)
                (f-domain-p y)
                (i-close x y)
                (not (equal x y)))
           (i-close (/ (- (f x) (f y))
                       (- x y))
                    (f-prime x)))

These theorems are precisely the ones that will be used to establish the constraints of encapsulates 
that are used to encode the composition theorems. The names of the theorems are important, because the 
macros will generate hints with those names. 

Similarly, the macro inverse-hyps generates the theorems that establish that the function f-inverse
is the compositional inverse of f.

• f-inverse-in-range

  (implies (f-inverse-domain-p x)
           (f-domain-p (f-inverse x)))

• f-domain-is-number

  (implies (f-domain-p x)
           (acl2-numberp x))

• f-inverse-relation

  (implies (f-inverse-domain-p x)
           (equal (f (f-inverse x)) x))

• f-d/dx-f-relation

  (implies (f-inverse-domain-p x)
           (equal (f-inverse-prime x)
                  (/ (f-prime (f-inverse x)))))

• f-prime-not-zero

  (implies (f-domain-p x)
           (not (equal (f-prime x) 0)))

• f-preserves-not-close

  (implies (and (f-domain-p x)
                (f-domain-p y)
                (i-limited x)
                (not (i-close x y)))
           (not (i-close (f x) (f y))))

The theorems generated by this macro are precisely the constraints needed to establish that the
differentiable function $f$ has an inverse. Note, for example, the last two theorems above. The theorem
f-prime-not-zero ensures that $f'(x) \neq 0$ in the domain of $f$. This is, in fact, one of the hypotheses of the
theorem that states that $(f^{-1})'(x) = 1/f'(f^{-1}(x))$, since the expression contains $1/f'(\ldots)$.
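
The role of f-prime-not-zero can be seen from the usual informal derivation of the inverse rule.
The sketch below applies the chain rule to the identity stated by f-inverse-relation; it is meant
as motivation and takes the differentiability of $f^{-1}$ for granted, so it is not the proof carried out
in ACL2(r).

```latex
\begin{align*}
f\bigl(f^{-1}(x)\bigr) &= x \\
f'\bigl(f^{-1}(x)\bigr)\cdot (f^{-1})'(x) &= 1
   && \text{differentiate both sides (chain rule)} \\
(f^{-1})'(x) &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
   && \text{valid only if } f'\bigl(f^{-1}(x)\bigr) \neq 0 .
\end{align*}
```

Since f-inverse-in-range places $f^{-1}(x)$ in the domain of $f$, the theorem f-prime-not-zero is what
guarantees that the division in the last line is well defined.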

Both defderivative and derivative-hyps can be used in two different contexts. First, our own
functions and macros use them to generate constraints in various encapsulates. Second, the user can
use these macros to generate the theorems that correspond to these constraints, thus making sure that 
(a) the constraints will be satisfied, and (b) the theorems satisfy the naming conventions assumed by the 
macros. 

One final point is related to the theorems generated by these macros: These have to be proved by 
ACL2(r). Many times, the proofs succeed automatically, because the macros are careful to use the 
minimal theory required for the proofs to go through, but sometimes ACL2(r) needs a little help. The 
macros use keyword arguments to accept hints that will be passed on to the appropriate generated
theorems. For example, the keyword argument not-close-hints is used to provide a hint to the theorem
f-preserves-not-close. 

The macro defderivative is the main entry point into the automatic differentiator. It takes two
arguments: a prefix used to scope the generated theorem names, and the arithmetic expression that is
to be differentiated. By "arithmetic expression", we mean an ACL2 term that is composed only of numbers,
variables, arithmetic operators, and the application of functions with known derivatives or functions that
are defined in terms of arithmetic expressions or as the inverse of functions that have known derivatives
or are defined in terms of arithmetic expressions. In particular, recursive functions are not allowed, nor
are functions that use if or cond.

The macro defderivative first computes the derivative of the expression symbolically, and then
generates the lemmas necessary to demonstrate that the symbolically computed derivative is correct.
Note that the lemmas follow the pattern and naming convention defined by derivative-hyps, so the
derivatives of deeply nested expressions can be done automatically.

Finally, the macros def-elem-derivative and def-elem-inverse are used to register functions
with known derivatives or inverses, respectively. For example, the AD system automatically defines the
derivative of $f(x) = 1/x$ with the following event:
(def-elem-derivative
 unary-/
 elem-unary-/
 (and (acl2-numberp x)
      (not (equal x 0)))
 (- (/ (* x x))))

The arguments to this macro are (1) the name of the function that has a known derivative (in this case, 
unary-/), (2) a prefix used to name all the theorems generated by derivative-hyps, (3) an ACL2 
expression for the domain of the function, and (4) an ACL2 expression for the derivative. Note that 
def-elem-derivative does not prove any theorems. Rather, it registers in a global database that the 
given function has the specified derivative. It is expected that the proofs have been previously generated, 
e.g., using derivative-hyps. 

Similarly, def-elem-inverse is used to register an inverse function. For example, the following 
expression registers the inverse of the function square: 

(def-elem-inverse
 square-inverse
 square-inverse
 (square-domain-p x)
 (square-inverse-domain-p x)
 square)

The arguments are similar to def-elem-derivative, except both the domain and range (or
inverse-domain) need to be specified.

As we mentioned previously, the AD automatically uses def-elem-derivative to register the
derivatives of $f(x) = 1/x$ and $f(x) = -x$. This explains how subtraction and division are handled, namely
through the chain rule and those two derivative facts. In addition, we have developed several ACL2 books
that establish the derivative of more complex functions, such as $e^x$, $\ln x$, $\sin(x)$, $\sin^{-1}(x)$, etc. The
derivative of $e^x$ was done using first principles, but the others were done using the macros described in this
section. The relevant books also register the derivatives of these functions using def-elem-derivative.
The derivative facts we have established so far are summarized in Figure 1. We note in passing that the
macros support not just unary functions but binary functions where one argument is held fixed. This
allows us to differentiate the raise function to find the derivatives of $x^n$ and $a^x$. The trick of holding an
argument fixed is essentially the same one used in [SG02].
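
To make the treatment of subtraction and division concrete, the equations below sketch how both reduce
to the sum and product rules once the derivatives of $-x$ and $1/x$ are available. The function names
$u$ and $v$ are ours, and the derivation is written in standard notation rather than as ACL2 terms.

```latex
\begin{align*}
(u - v)'(x) &= \bigl(u + (-v)\bigr)'(x) = u'(x) + (-1)\cdot v'(x), \\
\left(\frac{1}{v}\right)'(x) &= -\frac{1}{v(x)\cdot v(x)}\cdot v'(x)
   && \text{chain rule with } \tfrac{d}{dy}\tfrac{1}{y} = -\tfrac{1}{y \cdot y}, \\
\left(\frac{u}{v}\right)'(x) &= \Bigl(u \cdot \frac{1}{v}\Bigr)'(x)
   = u(x)\cdot\Bigl(-\frac{v'(x)}{v(x)\cdot v(x)}\Bigr) + u'(x)\cdot\frac{1}{v(x)} .
\end{align*}
```

The last line simplifies to the familiar quotient rule $(u'v - uv')/v^2$, and it is only meaningful where
$v(x) \neq 0$, which matches the domain registered for unary-/.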

4 Implementing the Macros 

The macros derivative-hyps and inverse-hyps simply generate a progn containing several
theorems. There is not much to their implementation.

Similarly, def-elem-derivative and def-elem-inverse are nothing more than syntactic sugar 
for ACL2's built-in table facility. 

That leaves the definition of defderivative. In a nutshell, this macro works by repeatedly applying
the chain rule to an expression, until each of its subexpressions is either a constant, a variable, a function
with a known derivative, or the inverse of a function. Before describing this macro, we want to make
a minor point. Obviously, defderivative needs access to ACL2's definition database, so that it can
expand function applications to compute derivatives. However, access to these definitions depends on
access to ACL2 state, and ACL2 macros have traditionally not been allowed to access state. Such
access is now permitted via make-event² [KM].

[Figure 1: Dependency graph of the functions built into defderivative. Symbols leading into a function
represent how its derivative theorems were proved. Nodes: acl2-exp, acl2-ln, acl2-sine, acl2-cosine,
acl2-sqrt, unary-/, unary--, acl2-asin, acl2-atan, acl2-acos. Legend: from first principles; chain rule
and simplification; inverse functions.]

The first thing that defderivative does is translate the term to be differentiated, so that it does not
contain any macros. Among other things, this replaces terms using + with terms using binary-+. Then,
the resulting expression is differentiated symbolically. During this process, the necessary proofs are
collected and laid out using encapsulate. Many of the proofs are done automatically by instantiating
the relevant composition theorems. The macros know the names of the composition theorems and their
constrained functions, so they can generate the appropriate hints. Note that this process also involves the
symbolic computation of the domain of intermediate functions.

The proofs of these theorems need to be fully automated, since there is no way for the user to give 
hints, as the theorems may be associated with arbitrarily deep subterms of the original formula. We try 
to guarantee this automation by using only minimal theories, usually only the names of the theorems 
generated by derivative-hyps and inverse-hyps. This is one reason why users must conform to 
these naming conventions, even when they prove a derivative fact from first principles, as we did for $e^x$.

The last thing that defderivative does is to clean up the expressions generated for the derivative
and domain of the function. This is, perhaps, the biggest weakness of our current implementation. The
clean-up process is simplistic, consisting mainly of converting binary-+ back to +. We have considered
invoking ACL2's rewriter at this point, if only to perform arithmetic simplification, e.g., to convert
(+ (* x 1) (* x 1)) to (+ x x). But we have not found a satisfactory set of rewrite rules, so we are
leaving the sophisticated rewrites to the user.
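
The three phases described in this section (translation, symbolic differentiation, and clean-up) can be
traced on the introductory example. The trace below is a hand-written sketch: the intermediate terms
are our reconstruction from the description above and from the derivative reported for square, and are
not output captured from the macro.

```latex
\begin{align*}
\texttt{(square x)}
  &\;\to\; \texttt{(binary-* x x)}
     && \text{expand the definition, in translated form} \\
  &\;\to\; x \cdot \tfrac{d}{dx}x + x \cdot \tfrac{d}{dx}x
     && \text{product rule, up to the order of factors} \\
  &\;\to\; \texttt{(binary-+ (binary-* x 1) (binary-* x 1))}
     && \tfrac{d}{dx}x = 1 \\
  &\;\to\; \texttt{(+ (* x 1) (* x 1))}
     && \text{clean-up: restore the macro forms}
\end{align*}
```

The final term is left unsimplified, so rewriting it to (* 2 x) remains the user's job, as described above.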



² What did we do before make-event?






Implementing an Automatic Differentiator in ACL2 



5 Conclusions 

This paper described the implementation of an automatic differentiation (AD) system for ACL2(r). The 
implementation brought up several points of interest to the ACL2 community. 

First, the idea of using macros to generate theorems according to some pattern is as old as ACL2 
(at least). However, the current work shows how different macros can cooperate by keeping information 
in the ACL2 state. For example, the macro definv can register information about inverse functions,
which is subsequently used by the macro defderivative.

Second, since ACL2 state is now available to macros, the macros can generate code that depends 
on the ACL2 database. For instance, the macros can use the definitions of ACL2 functions to generate 
theorems according to some pattern. 

Third, our approach demonstrates the care that must be taken when designing libraries intended 
for automation. The history of ACL2 and the Boyer-Moore theorem prover includes several examples 
of libraries that are carefully designed so that new theorems can be proved almost automatically. The 
lemmas in these libraries are chosen carefully so that they work well with ACL2's heuristics (and vice 
versa). But when a macro develops a complex theory from arbitrary ACL2 expressions, it becomes 
increasingly likely that some rewrite rules triggered by the ACL2 expression interfere with the proof 
plan of the macro. So the macro has to take careful control of the proof execution, especially if hints 
are involved to instantiate constrained functions. In our experience, we have improved our chances by 
explicitly controlling the active ACL2 theory, and making sure it only has the rewrite rules that we think 
are absolutely necessary for the theorems to prove. We tried to use :by hints in these instantiations,
so that the proof plan was completely controlled by the hints. Unfortunately, we ran into too many
cases where lambda expressions created for derivatives did not exactly match the functional instantiation
generated with a :by hint, so we had to switch to regular :use hints instead. So far, the proof plans are
succeeding, but we would prefer to have a more robust mechanism.

Fourth, our solution illustrates how naming conventions can be used to associate facts with functions.
For example, the fact that square is 1-to-1 in its domain can be stored in the theorem square-is-1-to-1
and the domain itself in the function square-domain-p. Such conventions enable macros to generate
appropriate hints. Moreover, maintaining those conventions is easy if the macros themselves generate
the required theorems. Things are more complicated when some of the theorems need to be generated
by hand, e.g., to show the derivative of $e^x$.

Finally, the techniques described in this paper have enabled us to vastly extend the derivative facts
that have been certified with ACL2(r). As part of this project, we demonstrated from first principles that
$de^x/dx = e^x$. Then we used the macros that we developed to formalize the derivatives of the trigonometric
functions, and the inverse trigonometric and logarithmic functions.